Abstract


Language teaching is a difficult process that requires careful work. Educators try to find
ways to make this difficult process enjoyable for language learners. As technological
developments have come into use, language learning has become more attractive. 'Text to
Speech' (speech synthesis) technology, one of these developments, is the process of
synthesizing natural-sounding speech from any text by means of special computer
programs. These programs can create the most realistic, human-sounding synthetic speech
available today. This paper explores the possible uses, advantages, and
disadvantages of this technology for EFL learners.
Introduction


Language teaching is a rather difficult and complicated process that requires careful and
diligent work. Educators in the field of language teaching always try hard to find ways to
make language learning enjoyable and attractive for learners. Different activities,
games, and interesting stories have helped language teachers achieve this aim for many
years, and they still do.

Today, we have access to many CALL programs that are currently used and tested in
language classrooms for teaching grammar, speaking, and other skills. Text-to-speech
technology is a common feature of almost any CALL application. 'Text-to-Speech' (Speech
Synthesis) technology is the ability of a computer to produce 'spoken words'. Computer
speech can be produced either by "splicing" prerecorded words together or, with much
more difficulty, by having the computer produce the sounds that make up spoken words
(Microsoft Encarta Encyclopedia Deluxe, 2004).

In other words, text-to-speech is the conversion of text to speech through special computer
applications, often referred to as text-to-speech (TTS) software. Text-to-speech software
is invaluable for blind computer users, as it enables them to "read" from the screen. This
technology was first introduced in the Texas Instruments Speak and Spell handheld electronic
learning aid in 1978. Language learners and teachers need to be informed about this
technology and its possible uses, advantages, and limitations, since it is new to
them.


Language Learning and Technology: A brief review of history 


As Warschauer and Meskill (2000) suggest, every type of language teaching uses its own
techniques to help learners. With the introduction of the Grammar-Translation method, the
blackboard came into use in language classrooms. Later it was replaced by the overhead
projector. Subsequently, computer software was used to provide students with
drill-and-practice exercises.

The first use of computers by institutions related to teaching and learning coincided with
the introduction of second-generation computers in the 1950s. Large universities
started to use computers for administrative processes and student record keeping. At the
same time, computers were used for instruction and research. PLATO
(Programmed Logic for Automatic Teaching Operations), the very first project related to
the use of computers in educational research, began in 1960 at the University of Illinois to
design a large computer-based system for instruction. The PLATO system included a
mainframe machine supporting hundreds of terminals, which was a high capacity for
that era. Many courses in many disciplines were developed, designed, and delivered on
PLATO systems (Alessi & Trollip, 1985; Warschauer, 1996; Levy, 1997; Culley, 1992).
Later, new versions of PLATO came into use with changes that provided interactive and
self-paced instruction.

During the 1960s and 1970s, the use of computer-assisted instruction expanded in public 
schools with the introduction of the next generation of computers and microchips which 
were cheaper (Bullough & Beatty, 1991). In 1971, another important project, TICCIT 
(Time-shared, Interactive, Computer Controlled Information Television) was initiated at 
Brigham Young University (Levy, 1997). The system combined television technology with
the computer to deliver instruction to the learners. 

During the 1980s, microcomputers started to be adopted by schools, and new
developments such as CD-ROM, speech-based software, and interactive video appeared.
Experiments were also carried out on integrating computers into the curriculum. In the
1990s and 2000s, the introduction of fast, affordable processors, new software, and
wide-scale, fast access to the Internet made computers available in almost all public and
private schools, as well as in homes, for personal and educational use.




Meanwhile, what went unnoticed was 'text-to-speech' technology, originally designed for
visually impaired people. Speech synthesis, the conversion of text to speech through
special computer applications, is often referred to as text-to-speech (TTS) software.
Text-to-speech software is considered invaluable for the blind, since it enables them to read
from computer screens. However, it did not attract much attention from language learners and
teachers. This might be attributed to views of this new technology such as the one Higgins
(as cited in Ehsani & Knodt, 1998, p. 46) states: "Because speech technology isn't perfect, it
is of no use at all. If it cannot account for the full complexity of human language, why even
bother modeling more constrained aspects of language use."

Although Higgins's point cannot be refuted, since speech technology does not yet capture
the full complexity of human language, it is important to note that the technology has some
possible uses in language teaching and learning. We should also take into account that
technology improves day by day, and what is good today will no doubt be better tomorrow.
Ehsani and Knodt (1998) and Sobkowiak (2003) stated that text-to-speech technology will
be a common feature of any CALL application and that human language technologies will
improve the current software for foreign language teaching.

(Please refer to Audio5 for the early TTS sound technology). 


'Text to Speech' Computer Applications 


Currently, there are three computer applications available to home users who want to
benefit from this technology, namely Natural Voice Reader, ReadPlease Plus 2003, and
TextAloud MP3. All of these programs aim to produce the most realistic human-sounding
voices. However, Garrett (1998, p. 81) states, "This technology isn't at a stage
where it can reliably render a target language accent authentic enough for language use."
Before dealing with the performance issue, the first question to be answered is what these
applications can do.


What these applications can do 

In general, applications using 'Text to Speech' technology can

• Read any text on the computer (web pages, Word documents, rich text files, e-mails, news
articles, online books, etc.).

• Save the reading of any text to a file in wav or mp3 format, which makes it possible
to listen to it later on an MP3 or CD player.

• Read any text at any speed and any speaking quality.

• Read any text using any voice or accent (male, female, British English, American
English, etc.).
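
The capabilities listed above (choosing a voice, setting the reading speed, and saving
the result to an audio file) can be illustrated with a short script. This is only a
sketch and is not taken from any of the three applications: it assumes the third-party
Python library pyttsx3, and the helper name save_speech and its default values are
hypothetical. The output format depends on the speech driver installed on the computer.

```python
def save_speech(engine, text, out_path, rate_wpm=150, voice_id=None):
    """Render `text` to an audio file with the requested speed and voice.

    `engine` is assumed to offer the pyttsx3 engine methods
    setProperty, save_to_file and runAndWait.
    """
    engine.setProperty("rate", rate_wpm)        # speaking speed in words per minute
    if voice_id is not None:
        engine.setProperty("voice", voice_id)   # identifier of an installed voice
    engine.save_to_file(text, out_path)         # queue the text for rendering
    engine.runAndWait()                         # process the queue and write the file


# Possible usage, assuming pyttsx3 is installed (pip install pyttsx3):
#   import pyttsx3
#   engine = pyttsx3.init()
#   voices = engine.getProperty("voices")       # voices available on this machine
#   save_speech(engine, "What kind of books do you read?", "question.wav",
#               rate_wpm=120, voice_id=voices[0].id)
```

Passing the engine in as a parameter keeps the helper independent of any one speech
driver, so the same function can be tried with different voices and speeds.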


Performance consideration 


Perhaps the most important consideration is whether this technology can create authentic
speech. In other words, will the speech produced be authentic enough? To answer this question,
Natural Voice Reader Enterprise Edition, with the AT&T Mike and Crystal American English
voices, was tested and used to create synthesized versions of a mini dialogue and a
long text (see Appendices) from the TOEFL Test Preparation Kit published by Educational
Testing Service (ETS) in 1995 (Audio2 and Audio3; the original speech files are Audio1
and Audio4). When the recordings were compared, the synthesized voices were found to be
satisfactory in terms of pronunciation and clarity; however, some limitations were noted.

Taking language learners into consideration, the following list can be made regarding the 
uses of these programs: 


Advantages 

• You can listen to any text on any topic. (Most EFL listening materials cover a limited
range of topics, and some of them are rather expensive.)

• You can adjust the speed of reading according to your own needs. 

• You can create audio versions from any text (wav or mp3 files). 

• You can create pronunciation exercises for yourself (A single word can also be read.) 

• You can create mini dialogues (changing speakers at run time is possible).


Limitations 


• Although these programs can create realistic, human-sounding voices, there is always
a difference in terms of intonation and stress. In other words, the synthesized speech
still lacks the complexity of naturally occurring speech, resulting in a 'dead' sound with
no emotion. This is easily noticed when these programs read rather long
sentences (compare Audio1 with Audio2 for a mini dialogue, and Audio3 with
Audio4 for a long text). It should be noted that there is no limit to technological
advances, and capturing the complexity of naturally occurring speech may be possible
in the near future.

• These programs require relatively new and fast computers and enough hard disk space to
run (operating system: Windows 98/Me/NT/2000/XP; processor: 500 MHz; memory:
128 MB; disk space required: 500 MB plus 600 MB for each voice). However,
today's new computers easily meet these requirements.


Examples of how this technology can be used 


1. Creating a list of frequently-mispronounced words 


Language teachers and learners can create a list of frequently mispronounced words
and save this list as a "wav" file for later use. Learners can listen to these words and repeat
them while the file is played. Below is an example of a possible list (Audio5):

Foreign 

Interesting 

Determine 

Occurrence 

Preface 

Comparable 

Compare 

Comparison 

Carriage 

Marriage 

Natural 

Mature 

Iron 

Capable 

Business 

Major 

Subtle 

Impotent 

Suitable 

Support 
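
For listen-and-repeat practice with a list like the one above, the learner needs a pause
after each word. The sketch below shows one possible way to add such pauses using only
Python's standard wave module. It assumes that each word has already been saved as its
own uncompressed PCM wav file and that all files share the same format; the function name
join_with_gaps and the two-second default pause are illustrative assumptions, not
features of the applications discussed in this paper.

```python
import wave


def join_with_gaps(word_paths, out_path, gap_seconds=2.0):
    """Concatenate per-word wav files, adding silence after each word."""
    params = None
    chunks = []
    for path in word_paths:
        with wave.open(path, "rb") as src:
            fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if params is None:
                params = fmt                      # first file defines the format
            elif fmt != params:
                raise ValueError(f"{path} has a different audio format")
            chunks.append(src.readframes(src.getnframes()))
    if params is None:
        raise ValueError("no input files given")

    channels, width, rate = params
    # 8-bit PCM is unsigned (silence is 0x80); wider samples are signed (0x00).
    silent_byte = b"\x80" if width == 1 else b"\x00"
    silence = silent_byte * (int(gap_seconds * rate) * channels * width)

    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        for chunk in chunks:
            dst.writeframes(chunk)                # the spoken word
            dst.writeframes(silence)              # time for the learner to repeat


# Possible usage with hypothetical file names:
#   join_with_gaps(["foreign.wav", "interesting.wav"], "drill.wav", gap_seconds=3)
```

A longer pause can be chosen for longer words or for beginners by changing gap_seconds.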


2. Writing short sentences and listening 


Language learners can write short sentences to the extent that their imagination allows and
listen to them. In this way, the process can be made enjoyable and
fascinating. Below are possible short sentences that can be created (Audio6):

Excuse me, where is the nearest post office, please?

What kind of books do you read? 

What kind of music do you like? 

What do you do when you are bored? 


3. Creating short dialogues 


Language learners can also write dialogues, changing the speaker voice of the
program for each turn. Below is a possible dialogue that can be created (Audio7):

Excuse me, where is the nearest lost property office, please?

I'm sorry, I don't know.

Thank you anyway. 

Not at all. 
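
Changing the speaker for each turn of a dialogue like the one above can be prepared
before any audio is produced. The following sketch simply pairs each line with a voice
in alternating order. The voice names Crystal and Mike are the AT&T voices mentioned in
this paper; the function name assign_voices is hypothetical, and how each pair is then
sent to a speech program depends on the program used.

```python
from itertools import cycle


def assign_voices(dialogue_lines, voice_names):
    """Pair each non-empty dialogue line with a voice, alternating speakers."""
    if not voice_names:
        raise ValueError("at least one voice name is required")
    voices = cycle(voice_names)          # Crystal, Mike, Crystal, Mike, ...
    turns = []
    for line in dialogue_lines:
        text = line.strip()
        if not text:                     # skip blank lines between turns
            continue
        turns.append((next(voices), text))
    return turns


dialogue = [
    "Excuse me, where is the nearest lost property office, please?",
    "I'm sorry, I don't know.",
    "Thank you anyway.",
    "Not at all.",
]
for voice, text in assign_voices(dialogue, ["Crystal", "Mike"]):
    print(f"{voice}: {text}")
```

With two voices, the first and third lines go to one speaker and the second and fourth
lines to the other, which matches the turn-taking of the dialogue.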


4. Reading and listening to newspapers online


Language learners can also read newspapers online and save them as wav files for later use.
Below is an online article which was saved as a sound file (Audio8):

"A 28-year-old South Korean man has died after playing an online computer game for 
almost 50 hours non-stop. The man, known only by his family name of Lee, started playing 
the popular battle simulation game Starcraft on August 3 and was fixed to his seat for over 
two days. His marathon gaming session was apparently broken only with the occasional 
toilet break or five-minute nap. Reuters News Agency reports police sources saying the man 
died from cardiac arrest "stemming from exhaustion".

Lee was on a mission to become a professional gamer. This is an increasingly attractive 
and well-paid profession in South Korea. Top players can earn substantial amounts of 
money each year. Lee had recently been fired from his job because of absences due to his 
obsession with gaming. The dangers of being addicted to fantasy games are resulting in 
many social problems. In particular, MMORPGs, or massively multiplayer online role 
playing games, keep thousands of players glued to their screens for many hours."
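
Before an online article like the one above can be read aloud, its text has to be
separated from the markup of the web page. The applications discussed here do this
internally; the sketch below only illustrates the idea, using Python's standard
html.parser module. The class name ArticleText, the function name html_to_text, and the
choice of which tags to skip or treat as paragraph breaks are assumptions made for this
example.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect the readable text of a web page, ignoring scripts and styles."""

    SKIP = {"script", "style"}                       # content never read aloud
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append(" ")                  # keep blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        return " ".join("".join(self._parts).split())


def html_to_text(html):
    """Return the plain text of an HTML string, ready to pass to a TTS program."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<html><body><h1>News</h1><p>Lee was on a <b>mission</b>.</p></body></html>"
print(html_to_text(page))
```

The resulting plain text can then be pasted into, or passed to, whichever
text-to-speech program the learner uses.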


Conclusion 


'Text to Speech' (Speech Synthesis) technology has improved a lot, and it is ready to be
deployed in language learning, provided that its limitations are taken into consideration. If
instructors are trying to expose students to natural language audio input and
'comprehensible input' (Krashen, 1985) as much as possible, this technology can provide a
valuable way of doing so, provided that its limitations are fully understood and, as Ehsani
and Knodt (1998) stated, "it is used in ways that work around these limitations."


Recommendations 


Based on the discussion in this paper, the following recommendations are
made:

• 'Text to Speech' (Speech Synthesis) technology is ready to be deployed in second
language education, and instructors should be willing to explore possible uses of this
technology while taking its limitations into consideration.

• Experimental studies are needed to fully understand the possible uses/effects of this 
technology in language learning situations. Also, language learners' views and needs 
on the use of this technology will be beneficial in directing the future development of 
this technology. 
Sites of interest to readers


http://www.naturalreaders.com 

(Natural Voice Text-to-speech Reader software. You can have your computer read
documents aloud using a high-quality natural voice. With the built-in web browser, you
can view any web news on the Internet and have the computer read any part of the
news, weathercasts, chat messages, and e-mails. The application can read Word
documents, rich text files, and PDF files.)

http://www.readplease.com


(ReadPlease Plus 2003 will read any text you see on your screen. This can be from
your browser, e-mail, word processor, spreadsheet, or any program which displays
text.)

http://www.nextup.com

(TextAloud MP3 lets you listen to text you copy to the clipboard. It uses 'Text to
Speech' technology, which synthesizes human-sounding speech from ordinary
text.)

http://vlc.polyu.edu.hk 

TTS resources at the Virtual Language Centre of the Polytechnic University of Hong 
Kong. 

http://www.gutenberg.net 

(Project Gutenberg, the brainchild of Michael Hart, is an excellent source of a lot of 
famous and important texts which are in plain text format. The computer applications 
above can read these texts.)

http://www.pcww.com

Winspeech: a TTS program.

http://www.freedomscientific.com

JAWS for Windows: a remarkable TTS tool.

http://elsap1.unicaen.fr/KaliDemo.htm

KALI, a demonstration TTS package (French) from the University of Caen.

http://www.rhetorical.com

A TTS interactive demo (male and female voices speaking with American, British,
Scottish, Australian, and other accents).
APPENDICES


Appendix A- Script for Audio1 and Audio2

Woman : Do you know anyone who can translate this document? 

Man : What about the new secretary? I heard he's bilingual. 

Appendix B- Script for Audio3 and Audio4 

(Man) Before we begin our tour, I’d like to give you some background information on the 
painter Grant Wood- we'll be seeing much of his work today. 

Wood was born in 1881 in Iowa farm country, and became interested in art very early in
life. Although he studied art in both Minneapolis and at the Art Institute of Chicago, the
strongest influences on his art were European. He spent time in both Germany and France
and his study there helped shape his own stylized form of realism.

When he returned to Iowa, Wood applied the stylistic realism he had learned in Europe to 
the rural life he saw around him and that he remembered from his childhood around the 
turn of the century. His portraits of farm families imitate the static formalism of 
photographs of early settlers posed in front of their homes. His paintings of farmers at 
work, and of their tools and animals, demonstrate a serious respect for the life of the 
Midwestern United States. By the 1930's, Wood was a leading figure of the school of art 
called "American regionalism." 

In an effort to sustain a strong Midwestern artistic movement, Wood established an institute 
of Midwestern art in his home state. Although the institute failed, the paintings you are 
about to see preserve Wood's vision of pioneer farmers. 